Abstract. For a given set of intervals on the real line, we consider the problem of
ordering the intervals with the goal of minimizing an objective function that depends on 
the exposed interval pieces (that is, the pieces that are not covered by earlier intervals 
in the ordering). This problem is motivated by an application in molecular biology that 
concerns the determination of the structure of the backbone of a protein. 

We present polynomial-time algorithms for several natural special cases of the problem that cover the situation where the interval boundaries are agreeably ordered and the situation where the interval set is laminar. The bottleneck variant of the problem is also shown to be solvable in polynomial time. Finally, we prove that the general problem is NP-hard, and that the existence of a constant-factor approximation algorithm is unlikely.

Keywords: dynamic programming; bottleneck problem; NP-hard; exposed part; agreeable intervals; laminar intervals.



1 Introduction 

Let us consider a set $X$ of $n$ intervals $I_j = [a_j, b_j)$ for $j = 1, 2, \ldots, n$ on the real line. The length of interval $I_j$ is denoted by $|I_j| = b_j - a_j$. As usual, the length of a union of disjoint intervals is the sum of the lengths of the individual intervals. For an interval $I_j$ and a subset $S \subseteq X$ of the intervals, we define $I_j \setminus \bigcup_{I \in S} I$ to be that part of interval $I_j$ that is not covered by the union of the intervals in $S$; throughout this text this uncovered part will be called the exposed part of $I_j$ relative to subset $S$. Notice that the exposed part depends upon $S$ and in general need not be an interval. (If the intervals in $X$ are pairwise disjoint, then of course the exposed part of any interval $I$ relative to any set $S$ of intervals not containing $I$ is the interval $I$ itself.)

We investigate an interval ordering problem that is built around a cost function $f$ that assigns to every interval of length $p$ a corresponding real cost $f(p)$. The cost of a set $S$ of pairwise disjoint intervals is the sum of the costs of the individual intervals in $S$. The cost of an ordering $\alpha = (\alpha(1), \alpha(2), \ldots, \alpha(n))$ of all $n$ intervals is obtained by summing up, in that order, for every interval, the cost of its exposed part with respect to the previous intervals. Formally, the problem is defined as follows.

Definition 1. The Interval Ordering Problem: Given a function $f : \mathbb{R} \to \mathbb{R}$ and $n$ intervals $I_1, \ldots, I_n$ over the real line, find an ordering $\alpha \in S_n$ such that the cost

$$\sum_{k=1}^{n} f\Big(\Big|I_{\alpha(k)} \setminus \bigcup_{j=1}^{k-1} I_{\alpha(j)}\Big|\Big)$$

is minimized, where $S_n$ denotes the set of all the permutations of $\{1, 2, \ldots, n\}$.
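To make Definition 1 concrete, the following Python sketch evaluates the cost of a given ordering and finds an optimum by brute force over all $n!$ orderings, which is only practical for very small instances. Intervals are assumed to be half-open `(a, b)` pairs and orderings use 0-based indices; the function names `exposed_length`, `ordering_cost` and `brute_force_optimum` are our own.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]  # half-open interval [a, b)


def exposed_length(iv: Interval, covered: Sequence[Interval]) -> float:
    """Length of iv that is not covered by the union of the intervals in `covered`."""
    a, b = iv
    # Clip the earlier intervals to [a, b), keep the nonempty pieces, sweep left to right.
    pieces = sorted((max(a, c), min(b, d)) for c, d in covered if max(a, c) < min(b, d))
    hidden, reach = 0, a
    for s, e in pieces:
        s = max(s, reach)  # skip the part that was already counted
        if e > s:
            hidden += e - s
            reach = e
    return (b - a) - hidden


def ordering_cost(intervals: Sequence[Interval], order: Sequence[int],
                  f: Callable[[float], float]) -> float:
    """Objective of Definition 1: sum of f(|exposed part|) along the ordering."""
    covered: List[Interval] = []
    total = 0
    for idx in order:
        total += f(exposed_length(intervals[idx], covered))
        covered.append(intervals[idx])
    return total


def brute_force_optimum(intervals: Sequence[Interval], f: Callable[[float], float]):
    """Return (optimal cost, optimal ordering) by trying all n! orderings."""
    return min((ordering_cost(intervals, p, f), p)
               for p in permutations(range(len(intervals))))
```

For the instance of Example 1, `ordering_cost` gives 12 for the ordering $(1, 2, 3, 5, 4)$, written `(0, 1, 2, 4, 3)` with 0-based indices, and `brute_force_optimum` confirms that no ordering is cheaper.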

Observe that the interval ordering problem becomes trivial if all intervals are pairwise disjoint (since then all orderings yield the same cost). In the rest of this paper, an instance of the interval ordering problem is represented by $(X, f)$, where $X$ is the set of intervals and $f$ is the cost function.

Example 1. Consider the instance that consists of the five intervals $I_1 = [0, 1)$, $I_2 = [1, 2)$, $I_3 = [2, 3)$, $I_4 = [3, 6)$ and $I_5 = [0, 5)$, and the cost function $f(x) = 2^x$. An optimal solution for this instance is given by the sequence $\alpha = (1, 2, 3, 5, 4)$ with a total cost of 12 (the exposed parts have lengths 1, 1, 1, 2 and 1, so the cost is $2 + 2 + 2 + 4 + 2 = 12$).




[Figure: greedy ordering with respect to interval length vs. optimal ordering for Example 1.]



This example illustrates that, in general, an optimal solution will not sequence the intervals in order of increasing length (and it can be verified that in Example 1 no such sequence can yield the optimal objective value: any such sequence places $I_4$ before $I_5$ and costs $2 + 2 + 2 + 8 + 1 = 15$). The following natural greedy algorithm also fails: "Always select the interval with the smallest exposed part relative to the intervals sequenced so far". In fact, this greedy algorithm can be arbitrarily bad, as witnessed by the following example.

Example 2. Consider a family of instances, where each instance consists of $2k - 1$ intervals: $A_1 = [0, 2k)$, $A_2 = [2k - \varepsilon, 4k)$, $A_3 = [4k - \varepsilon, 6k)$, ..., $A_k = [2k(k-1) - \varepsilon, 2k^2)$, $B_1 = [k - \varepsilon, 2k^2)$, $B_2 = [3k - \varepsilon, 2k^2)$, $B_3 = [5k - \varepsilon, 2k^2)$, ..., $B_{k-1} = [2k^2 - 3k - \varepsilon, 2k^2)$, for some constants $k$ and $\varepsilon > 0$, with the cost function $f(x) = 2^x$.




[Figure: greedy ordering with respect to exposed length vs. optimal ordering for Example 2.]



A greedy sequence is $(A_1, A_2, \ldots, A_{k-1}, A_k, B_{k-1}, B_{k-2}, \ldots, B_1)$ and achieves a cost of $k 2^{2k} + k - 1$, whereas the optimal solution is $(A_k, B_{k-1}, A_{k-1}, B_{k-2}, \ldots, A_2, B_1, A_1)$ and has the cost $2^{2k+\varepsilon} + (2k-3) 2^k + 2^{k-\varepsilon}$. The ratio between both costs can be made arbitrarily large by choosing $k$ large and $\varepsilon > 0$ small.
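The gap in Example 2 can be reproduced numerically. The sketch below builds the instance for given $k$ and $\varepsilon$, runs the smallest-exposed-part greedy rule (ties broken by lowest index, which is our own choice), and evaluates the optimal sequence stated above; the names `example2`, `greedy_smallest_exposed`, `claimed_optimal_order` and `closed_forms` are hypothetical. We take $\varepsilon = 0.5$ so that all endpoints are exact binary fractions.

```python
import math


def exposed_length(iv, covered):
    """Length of the half-open interval iv not covered by any interval in `covered`."""
    a, b = iv
    pieces = sorted((max(a, c), min(b, d)) for c, d in covered if max(a, c) < min(b, d))
    hidden, reach = 0.0, a
    for s, e in pieces:
        s = max(s, reach)  # skip the part already counted
        if e > s:
            hidden += e - s
            reach = e
    return (b - a) - hidden


def example2(k, eps):
    """Intervals A_1, ..., A_k followed by B_1, ..., B_{k-1} (0-based list)."""
    A = [(0.0, 2.0 * k)] + [(2.0 * k * (i - 1) - eps, 2.0 * k * i) for i in range(2, k + 1)]
    B = [((2 * i - 1) * k - eps, 2.0 * k * k) for i in range(1, k)]
    return A + B


def ordering_cost(intervals, order, f):
    covered, total = [], 0.0
    for idx in order:
        total += f(exposed_length(intervals[idx], covered))
        covered.append(intervals[idx])
    return total


def greedy_smallest_exposed(intervals):
    """Always pick the unselected interval with the smallest exposed part."""
    remaining, covered, order = list(range(len(intervals))), [], []
    while remaining:
        # min() returns the first minimizer, so ties go to the lowest index.
        i = min(remaining, key=lambda j: exposed_length(intervals[j], covered))
        order.append(i)
        remaining.remove(i)
        covered.append(intervals[i])
    return order


def claimed_optimal_order(k):
    """(A_k, B_{k-1}, A_{k-1}, ..., A_2, B_1, A_1) as 0-based indices into example2."""
    order = []
    for i in range(k, 1, -1):
        order += [i - 1, k + i - 2]  # A_i, then B_{i-1}
    return order + [0]               # A_1 comes last


def closed_forms(k, eps):
    """Costs stated in Example 2 for the greedy and for the optimal sequence."""
    greedy = k * 2 ** (2 * k) + k - 1
    optimal = 2 ** (2 * k + eps) + (2 * k - 3) * 2 ** k + 2 ** (k - eps)
    return greedy, optimal


if __name__ == "__main__":
    k, eps = 4, 0.5
    X = example2(k, eps)
    greedy_cost = ordering_cost(X, greedy_smallest_exposed(X), lambda x: 2.0 ** x)
    opt_cost = ordering_cost(X, claimed_optimal_order(k), lambda x: 2.0 ** x)
    g_formula, o_formula = closed_forms(k, eps)
    print(greedy_cost, g_formula, math.isclose(opt_cost, o_formula))
```

Increasing $k$ in the main block shows the ratio between the two costs growing roughly like $k / 2^{\varepsilon}$.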
The contributions of this paper are twofold. On the positive side, we describe polynomial-time algorithms for some natural and fairly general special cases of the problem. On the negative side, we establish the computational complexity (NP-hardness) and the inapproximability of the problem.

The paper is organized as follows. In Section 2 we describe the real-world application (in molecular biology) that motivates the interval ordering problem. In Section 3 we formulate and present a number of special cases of the problem that can be solved in polynomial time. In Section 4 we present complexity and inapproximability results. We conclude in Section 5.

2 Motivation 

The interval ordering problem studied in this paper is motivated by a special case of the so-called distance geometry problem [1]. Formally, an instance of the latter consists of an undirected graph $G(V, E)$ with positive edge weights $d : E \to \mathbb{R}_+$. The goal is to find an embedding of the vertices into some Euclidean space, say $p : V \to \mathbb{R}^K$, satisfying the requested distances, i.e., for every edge $(u, v)$ we must have $\|p(u) - p(v)\| = d(u, v)$. This problem appears in the areas of graph drawing, localization of wireless sensors, and also in protein folding, as we now explain.

The protein folding problem consists of computing the spatial structure of a protein. To simplify the notation, we restrict the problem to the 2-dimensional space; this does not alter its essence. A protein is a huge molecule consisting of many different atoms linked together. Consider a simplified version of this problem where we only want to determine the structure of the backbone of the protein, that is, the positions of the main string of atoms. The exact sequence of atoms is known, and different approaches are used in practice to determine their spatial structure. One possibility is to use Nuclear Magnetic Resonance (NMR) to determine the distances between some pairs of atoms. The goal is then to reconstruct a folding that matches the measured distances. This problem, also called the 3-dimensional discretizable molecular distance geometry problem, is NP-hard (see [2]), and different algorithms have been proposed for it; we refer to [?] for a recent overview.

Formally, in the problem of reconstructing the backbone of a protein, we are given a vertex set $V = \{1, 2, \ldots, m\}$, enumerating all the atoms of the backbone, together with distances $d(i,j)$ for some pairs $i, j \in V$. It is a common assumption that all $d(i,j)$ with $i + 1 \le j \le i + 2$ are given and that $|d(i,i+1) - d(i+1,i+2)| < d(i,i+2) < d(i,i+1) + d(i+1,i+2)$ for all $i = 1, \ldots, m-2$. The first assumption is motivated by the fact that NMR reveals distances between atoms that are close to each other. The second assumption is motivated by the chemical fact that, in general, atoms in molecules are not in collinear positions. We call $d(i,j)$ a short-range distance if $i + 1 \le j \le i + 2$, and a long-range distance otherwise.

These assumptions give the problem a combinatorial structure. By translation invariance, we can without loss of generality place vertex 1 at the origin $(0, 0)$. By rotation invariance, we can without loss of generality place vertex 2 at $(d(1,2), 0)$. Now for vertex 3 there are only two positions respecting the distances $d(1,3)$ and $d(2,3)$, namely the two intersection points of the circle of radius $d(1,3)$ centered at $(0,0)$ and the circle of radius $d(2,3)$ centered at $(d(1,2), 0)$. In a similar manner, there are exactly two possible positions of vertex $i + 2$ relative to the segment between vertices $i$ and $i + 1$. Therefore, with fixed positions for vertices 1 and 2, there are exactly $2^{m-2}$ embeddings satisfying the short-range distances. We could describe each embedding by a binary string $x_3, x_4, \ldots, x_m$, where bit $x_i$ is 1 if and only if the triangle formed by vertices $i-2, i-1, i$ is oriented clockwise. But in order to circumvent the symmetry inherent to this problem, we describe each embedding by a binary string $y_3, y_4, \ldots, y_m$, where $y_j = x_3 \oplus x_4 \oplus \cdots \oplus x_j$. See Figure 1 for an illustration.



Figure 1: All 8 possible embeddings of a 5-vertex instance. The embedding
described by the bit string 000 is depicted with solid lines.
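The construction described above can be sketched in a few lines of Python. The hypothetical function `embed_2d` places vertex 1 at the origin and vertex 2 at $(d(1,2), 0)$, and then puts each further vertex $i$ on one of the two circle intersections, taking the clockwise one when $x_i = 1$; `x_to_y` computes the prefix XOR $y_j = x_3 \oplus \cdots \oplus x_j$. Passing the distances as a dictionary keyed by vertex pairs is an input format we assume for illustration.

```python
import math


def embed_2d(d, x):
    """Place vertices 1..m in the plane from the short-range distances.

    d maps pairs (i, j) with j - i in {1, 2} to distances; x maps each
    i = 3..m to a bit, where x[i] == 1 means that the triangle
    (i-2, i-1, i) is oriented clockwise.
    """
    m = max(j for _, j in d)
    p = {1: (0.0, 0.0), 2: (d[(1, 2)], 0.0)}
    for i in range(3, m + 1):
        (px, py), (qx, qy) = p[i - 2], p[i - 1]
        r1, r2 = d[(i - 2, i)], d[(i - 1, i)]
        base = math.hypot(qx - px, qy - py)
        ux, uy = (qx - px) / base, (qy - py) / base  # unit vector from i-2 to i-1
        # Intersection of the circles of radius r1 around i-2 and r2 around i-1:
        # distance t along the segment and offset h perpendicular to it.
        t = (r1 ** 2 - r2 ** 2 + base ** 2) / (2 * base)
        h = math.sqrt(max(r1 ** 2 - t ** 2, 0.0))
        # (-uy, ux) points to the left (counterclockwise side) of the segment.
        side = -1.0 if x[i] == 1 else 1.0
        p[i] = (px + t * ux - side * h * uy, py + t * uy + side * h * ux)
    return p


def x_to_y(x):
    """Prefix XOR: y_j = x_3 xor x_4 xor ... xor x_j."""
    y, acc = {}, 0
    for j in sorted(x):
        acc ^= x[j]
        y[j] = acc
    return y
```

The strict triangle inequalities assumed above guarantee $h > 0$, so the two candidate positions are always distinct.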

Now every long-range distance $d(i,j)$ implies a constraint on the unknown binary string $y$: it forces the bits $y_{i+3}, y_{i+4}, \ldots, y_j$ to values that yield an embedding in which atoms $i$ and $j$ are at the right distance. The problem now is to find, as efficiently as possible, values for the bits satisfying all measured distance constraints. Let us now introduce some notation to arrive at a formal definition of the problem.

Notation. If $a > b$, then $[a, b]$ is the empty interval. For any $a \le b$, by $\{0,1\}^{[a,b]}$ we denote the set of all bit strings of length $b - a + 1$ indexed from $a$ to $b$. If $[a, b] \subseteq [c, d]$ and $y \in \{0,1\}^{[c,d]}$, then we denote by $y[a, b]$ the restriction of $y$ to the indices from $a$ to $b$. We use $\{0,1\}^m$ as a shortcut notation for $\{0,1\}^{[1,m]}$.

Definition 2. The BitString Reconstruction Problem (BSRP): We are given an integer $m$, and $n$ triplets $(a_i, b_i, T_i)$ where $1 \le a_i \le b_i \le m$ and $T_i : \{0,1\}^{[a_i,b_i]} \to \{0,1\}$. The function $T_i$ is an oracle that returns 1 at a single element of its domain. The goal of the BSRP is to find a bit string $y \in \{0,1\}^m$ such that for all $i = 1, \ldots, n$ we have $T_i(y[a_i, b_i]) = 1$.

The idea is that a triplet in BSRP corresponds to a given distance between atoms $i$ and $j$ with $i + 3 \le j$ in the folding problem. Formally, a triplet is defined by $(a = i, b = j, T)$, where $T$ is the Boolean function that accepts a bit string $z$ if and only if $z = y[a, b]$ for every bit string $y \in \{0,1\}^m$ describing an embedding where $i$ and $j$ are at the given distance $d(i,j)$. At this point we assume that there is a unique bit string $z$ with this property. In two dimensions, this is equivalent to fixing the position of the third vertex, and in three dimensions this boils down to fixing the fourth vertex as well. Even with this strong simplification, we are facing a non-trivial and interesting algorithmic problem.

A straightforward algorithm to solve BSRP employs a brute-force approach (see [1] for a similar method called the Branch-and-Prune algorithm): letting $*$ be a symbol representing an unspecified bit, the idea of brute-force search is to start with a completely unspecified string $y = *^m \in \{0, 1, *\}^m$, and to refine it using the distances between atoms $i$ and $j$ with $|i - j| \ge 3$. More precisely:

Algorithm 1 The BruteForce search algorithm
1: for $i = 1, \ldots, n$ do
2:   Let $w = y[a_i, b_i]$ and let $\ell_i$ be the number of unspecified bits in $w$
3:   Search for $z$ such that $T_i(z) = 1$, ranging over all $2^{\ell_i}$ different replacements of $*$ in $w$
4:   if found then
5:     replace $y[a_i, b_i]$ by $z$ in $y$
6:   else
7:     exit and announce that there is no solution
8:   end if
9: end for
10: Return $y$, replacing all remaining occurrences of $*$ by an arbitrary bit
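A direct Python transcription of Algorithm 1 follows, as a sketch under our own conventions: bits are indexed $1, \ldots, m$, `None` plays the role of the unspecified symbol $*$, and each oracle $T_i$ is an ordinary function taking the tuple $y[a_i, b_i]$. The name `brute_force_search` is hypothetical. Besides the completed string, it returns $\sum_i 2^{\ell_i}$, the number of candidate replacements that line 3 may have to try in the worst case.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Oracle = Callable[[Tuple[int, ...]], bool]


def brute_force_search(m: int, triplets: Sequence[Tuple[int, int, Oracle]]):
    """Algorithm 1; returns (bits of y or None if there is no solution, sum of 2**l_i)."""
    y: List[Optional[int]] = [None] * (m + 1)  # index 0 unused; None = unspecified bit
    work = 0
    for a, b, T in triplets:
        w = y[a:b + 1]
        free = [t for t, bit in enumerate(w) if bit is None]  # the l_i unspecified bits
        work += 2 ** len(free)
        for choice in product((0, 1), repeat=len(free)):     # line 3: all replacements
            z = list(w)
            for t, bit in zip(free, choice):
                z[t] = bit
            if T(tuple(z)):
                y[a:b + 1] = z                                # line 5: replace y[a_i, b_i]
                break
        else:                                                 # lines 6-7: nothing accepted
            return None, work
    # Line 10: fill the remaining unspecified bits arbitrarily (here with 0).
    return [0 if bit is None else bit for bit in y[1:]], work
```

Running the same triplets in two different orders already shows that the returned work depends on the processing order, which is the question studied next.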



The running time of the BruteForce search algorithm is $O(\sum_{i=1}^{n} 2^{\ell_i})$, and it depends on the order in which the triplets of the instance are presented to the algorithm. The only remaining question is in which order to process the given distances. In fact, our goal is to find an order for the triplets in the instance of the BSRP to be passed to the BruteForce search algorithm that minimizes the running time. This leads to the interval ordering problem described in the introduction with the following additional structure: (i) all data are integral, and (ii) the cost function $f$ is given by $f(x) = 2^x$. (Each triplet $(a_i, b_i, T_i)$ corresponds to the interval $[a_i, b_i + 1)$, and $\ell_i$ is exactly the length of its exposed part relative to the triplets processed before it.)
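The correspondence can be checked mechanically. In the sketch below (all function names are ours), `bit_cost` counts the still-unspecified bits of each triplet directly, while `interval_cost` evaluates the interval ordering cost with $f(x) = 2^x$ on the half-open intervals $[a_i, b_i + 1)$; `costs_agree` compares the two over every processing order of a small instance.

```python
from itertools import permutations


def bit_cost(ranges):
    """Sum of 2**l_i, where l_i counts the bits of [a_i, b_i] not fixed by earlier triplets."""
    fixed, total = set(), 0
    for a, b in ranges:
        positions = set(range(a, b + 1))
        total += 2 ** len(positions - fixed)
        fixed |= positions
    return total


def interval_cost(ranges):
    """Interval ordering cost with f(x) = 2**x for the half-open intervals [a_i, b_i + 1)."""
    covered, total = [], 0
    for a, b in ranges:
        lo, hi = a, b + 1
        pieces = sorted((max(lo, c), min(hi, d)) for c, d in covered if max(lo, c) < min(hi, d))
        hidden, reach = 0, lo
        for s, e in pieces:  # measure of the covered part, merging overlaps
            s = max(s, reach)
            if e > s:
                hidden, reach = hidden + e - s, e
        total += 2 ** ((hi - lo) - hidden)
        covered.append((lo, hi))
    return total


def costs_agree(ranges):
    """True if both costs coincide for every processing order of the triplets."""
    return all(bit_cost(list(p)) == interval_cost(list(p)) for p in permutations(ranges))
```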

We should point out here that the protein folding application does not necessarily give rise to instances that display the special structures discussed in Section 3, namely agreeable intervals and laminar intervals.

3 Some polynomial-time solvable cases

In this section, we study some special cases of the interval ordering problem that can be solved in polynomial time. We first consider the case where the intervals are agreeable. We derive an $O(n^3)$ dynamic programming algorithm for solving this special case for any cost function $f$. When the cost function is continuous and convex, we propose a dynamic programming algorithm with time complexity $O(n^2)$. Next, we consider the case where the intervals are laminar and describe polynomial-time algorithms for solving the problem when the cost function $f$ is such that the function $g(x) = f(x) - f(0)$ is either super-additive or sub-additive. Finally, we study the bottleneck variant of the interval ordering problem and show that it can be solved in polynomial time when the cost function $f$ is either non-decreasing or non-increasing.

3.1 Agreeable intervals 

We say that a set $X$ of $n$ intervals $I_i = [a_i, b_i)$, for $i = 1, 2, \ldots, n$, is agreeable if there exists a permutation $\gamma$ of $\{1, \ldots, n\}$ such that $a_{\gamma(1)} < \cdots < a_{\gamma(n)}$ and $b_{\gamma(1)} < \cdots < b_{\gamma(n)}$. In other words, the ordering of the intervals induced by the left endpoints is the same as the ordering induced by the right endpoints. For ease of exposition, we will assume in the rest of this section that $\gamma$ is the identity permutation: thus we have $a_1 < \cdots < a_n$ and $b_1 < \cdots < b_n$. We can assume that two consecutive intervals $I_i$ and $I_{i+1}$ overlap (that is, $a_{i+1} < b_i$), because otherwise the problem would split into two sub-problems that can be solved independently. In what follows, we first consider the general case with an arbitrary cost function $f$, followed by a special case where the cost function $f$ is continuous and convex.

3.1.1 Arbitrary cost function 

In this section, we consider instances $(X, f)$ of the interval ordering problem with $X$ agreeable and $f$ arbitrary. Observe that in the case of agreeable intervals, after selecting the first interval, the problem decomposes into (at most) two unrelated instances that are each agreeable; we will use this property to derive a dynamic programming algorithm.

For a formal definition of the decomposition, consider the set $X = \{I_1, \ldots, I_n\}$ of agreeable intervals. We consider the exposed parts of each of these intervals with respect to $\{I_j\}$, $1 \le j \le n$. Since $X$ is agreeable, the exposed parts are again intervals, and we distinguish between those before $I_j$ and those after $I_j$.




Figure 2: The subinstance $X_{i,k}$.



For convenience, define $b_0 = a_1$ and $a_{n+1} = b_n$. For any pair of indices $0 \le i, k \le n + 1$ we define the subinstance $X_{i,k} := \{I_j \cap [b_i, a_k) : i < j < k\}$. Notice that if $b_i \ge a_k$, then $X_{i,k}$ consists of $k - i - 1$ intervals of zero length. Let $C(i,k)$ be the cost of an optimal solution to $(X_{i,k}, f)$, with $C(i,k) = 0$ if $X_{i,k} = \emptyset$. We have the following recursion.
Lemma 1. For $0 \le i < k \le n + 1$ we have $C(i,k) = 0$ in case $i + 1 = k$, and otherwise

$$C(i,k) = \min_{i < j < k} \big\{ C(i,j) + f(|I_j \cap [b_i, a_k)|) + C(j,k) \big\}.$$

Proof: The case $i + 1 = k$ follows from $X_{i,i+1} = \emptyset$, and the remaining case follows from the fact that (1) some interval $I_j$ has to be selected first, and (2) after selecting that interval the problem decomposes into two unrelated instances, $X_{i,j}$ and $X_{j,k}$, each being agreeable. □

Theorem 1. The interval ordering problem $(X, f)$ with $X$ agreeable and $f$ arbitrary can be solved in $O(n^3)$ time.

Proof: Lemma 1 leads to a dynamic programming algorithm with $O(n^2)$ variables, each computable in linear time. □
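The recursion of Lemma 1 translates directly into a memoized dynamic program. The Python sketch below (function name `agreeable_optimum` is ours) accepts `(a, b)` pairs, sorts them by left endpoint, checks that the right endpoints come out sorted as well, and uses the sentinels $b_0 = a_1$ and $a_{n+1} = b_n$. It returns only the optimal cost $C(0, n+1)$; recovering an optimal ordering would additionally require storing the minimizing $j$ for each pair $(i, k)$.

```python
from functools import lru_cache


def agreeable_optimum(intervals, f):
    """Optimal cost C(0, n+1) of an agreeable instance via the recursion of Lemma 1."""
    if not intervals:
        return 0
    iv = sorted(intervals)                      # by left endpoint (ties by right endpoint)
    n = len(iv)
    a = [None] + [s for s, _ in iv] + [None]    # 1-based, with room for the sentinels
    b = [None] + [e for _, e in iv] + [None]
    assert all(b[j] <= b[j + 1] for j in range(1, n)), "intervals are not agreeable"
    b[0], a[n + 1] = a[1], b[n]                 # sentinels b_0 = a_1 and a_{n+1} = b_n

    def clipped_length(j, i, k):
        """|I_j intersected with [b_i, a_k)|, which is 0 if the intersection is empty."""
        return max(0, min(b[j], a[k]) - max(a[j], b[i]))

    @lru_cache(maxsize=None)
    def C(i, k):
        if i + 1 == k:                          # X_{i,i+1} is empty
            return 0
        # Some I_j is selected first; the rest splits into X_{i,j} and X_{j,k}.
        return min(C(i, j) + f(clipped_length(j, i, k)) + C(j, k) for j in range(i + 1, k))

    return C(0, n + 1)
```

There are $O(n^2)$ memoized pairs $(i, k)$ and each minimization scans $O(n)$ values of $j$, matching the bound of Theorem 1.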



3.1.2 Continuous and convex cost function 

In this subsection, we still assume that the intervals in $X$ are agreeable, but we consider the cost function $f$ to be continuous and convex. Recall that a function $f$ defined on a convex set $\mathrm{dom}(f)$ is convex when $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for all $x, y \in \mathrm{dom}(f)$ and $0 \le \lambda \le 1$. We need the following result, due to Karamata [?] (see also pages 30-32 in Beckenbach and Bellman [6]).

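Lemma 2. Given $2q + 2$ numbers $\{x_k, y_k\}$, $k = 0, 1, \ldots, q$, satisfying:

• $x_0 \ge x_1 \ge \cdots \ge x_q$ and $y_0 \ge y_1 \ge \cdots \ge y_q$,

• $\sum_{i=0}^{k} x_i \ge \sum_{i=0}^{k} y_i$ for each $k = 0, 1, \ldots, q - 1$, and $\sum_{i=0}^{q} x_i = \sum_{i=0}^{q} y_i$,

then, for any continuous, convex function $f$ we have:

$$\sum_{i=0}^{q} f(x_i) \ge \sum_{i=0}^{q} f(y_i). \qquad (1)$$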
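A quick numerical illustration of Lemma 2 (Karamata's inequality) on one example: the hypothetical helper `majorizes` checks the conditions of the two bullets after sorting both sequences in non-increasing order, and for the convex function $f(t) = t^2$ the sums then satisfy (1). This illustrates the statement; it is not a proof.

```python
def majorizes(x, y):
    """Conditions of Lemma 2 for x and y, after sorting both in non-increasing order."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if len(xs) != len(ys):
        return False
    sx = sy = 0
    for xi, yi in zip(xs, ys):
        sx, sy = sx + xi, sy + yi
        if sx < sy:              # a partial-sum condition is violated
            return False
    return sx == sy              # the total sums must coincide


def sum_of(f, values):
    return sum(f(v) for v in values)


def square(t):
    return t * t                 # a continuous, convex function


x, y = [3, 1, 0], [2, 1, 1]
print(majorizes(x, y), sum_of(square, x), sum_of(square, y))  # True 10 6
```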

Let $(X, f)$ be an instance of the interval ordering problem where $X$ is agreeable and contains $n$ intervals $I_i = [a_i, b_i)$, $i = 1, \ldots, n$, and $f$ is continuous and convex. For a given solution to $(X, f)$ (i.e., a sequence of intervals), we call an interval $I_i$ an E-interval if its left endpoint $a_i$ is contained in the exposed part of $I_i$ relative to the set of intervals sequenced before $I_i$ (in that solution). Given an integer $k$, $1 \le k \le n$, let $X_k$ be the set containing the intervals $I_i = [a_i, b_i)$ for $i = k, \ldots, n$, and let $C_k$ be the value of an optimal solution to the instance $(X_k, f)$. Notice that this definition implies that $X = X_1$. Further, interval $I_k$ is an E-interval in any feasible solution to $(X_k, f)$.

Lemma 3. Let $(X, f)$ be an instance of the interval ordering problem with $X$ agreeable and $f$ continuous and convex, and let $X_k$ be defined as above.

If, in an optimal solution to $(X_k, f)$, interval $I_k$ is the only E-interval, then

$$C_k = f(b_k - a_k) + \sum_{i=k+1}^{n} f(b_i - b_{i-1}). \qquad (2)$$

Otherwise, in this optimal solution to $(X_k, f)$, let $I_j$ with $j > k$ be the first E-interval, i.e., the E-interval with minimal $a_j$. If $a_j \le b_k$, then $j = k + 1$ and

$$C_k = f(a_{k+1} - a_k) + C_{k+1}. \qquad (3)$$

If $a_j > b_k$, then

$$C_k = f(b_k - a_k) + \sum_{i=k+1}^{\ell} f(b_i - b_{i-1}) + f(a_j - b_\ell) + (j - \ell - 2) f(0) + C_j, \qquad (4)$$

where $I_\ell$ is the latest interval in $X_k$ that satisfies $b_\ell \le a_j$.

Proof: We will show that if interval $I_k$ is the only E-interval, then the optimal sequence to $(X_k, f)$ is simply $(k, k+1, \ldots, n)$. Otherwise, if there is another E-interval $I_j$, where $I_j$ is the first E-interval with $j > k$, then the optimal sequence to $(X_k, f)$ is the sequence of the solution to $(X_j, f)$ followed by $k, k+1, \ldots, j-1$. See Figure 3 for an illustration.




Figure 3: Recurrence relation of $C_k$. If $I_j$ is the first E-interval after $I_k$, then the cost
divides into the cost of the intervals between $k$ and $j$, plus $C_j$.



Case 1: Interval $I_k$ is the only E-interval. We show that $\alpha_0 = (k, k+1, \ldots, n)$ is an optimal sequence to $(X_k, f)$. The sequence $\alpha_0$ partitions $[a_k, b_n)$ into $n - k + 1$ nonempty segments, defined by $S_0 = [a_k, b_k)$ and $S_i = [b_{k+i-1}, b_{k+i})$ for $i = 1, \ldots, n - k$ (this is true since the intervals are agreeable). Let $\sigma$ be a permutation of $\{0, 1, \ldots, n - k\}$ such that $|S_{\sigma(i)}| \ge |S_{\sigma(i+1)}|$ for $i = 0, 1, \ldots, n - k - 1$; the permutation $\sigma$ orders the segments induced by $\alpha_0$ by non-increasing length. Now, let $\alpha \ne \alpha_0$ be some sequence of intervals which does not feature another E-interval apart from interval $I_k$. Clearly, $\alpha$ partitions $[a_k, b_n)$ into fewer than $n - k + 1$ nonempty segments, each segment being defined by a pair from the set $\{a_k, b_k, b_{k+1}, \ldots, b_n\}$ (indeed, notice that the only way to have $n - k + 1$ segments is when $\alpha = \alpha_0$). Let us suppose that $\alpha$ partitions $[a_k, b_n)$ into $p + 1$ segments ($1 \le p \le n - k - 1$) $S'_0, \ldots, S'_p$ satisfying $|S'_0| \ge |S'_1| \ge \cdots \ge |S'_p|$. For convenience, set $S'_{p+1} = \cdots = S'_{n-k} = \emptyset$.
Observation: Any segment $S'_i$ ($i = 0, 1, \ldots, p$) is either identical to a segment $S_j$ for some $j \in \{0, 1, \ldots, n - k\}$ or is a union of consecutive segments $S_j$.

This observation follows from the fact that the segments are defined by points in the set $\{a_k, b_k, b_{k+1}, \ldots, b_n\}$. We will use this observation to argue that for each $m = 0, 1, \ldots, n - k$:

$$\sum_{i=0}^{m} |S'_i| \ge \sum_{i=0}^{m} |S_{\sigma(i)}|. \qquad (5)$$

For any $m$ with $p \le m \le n - k$, we have $\sum_{i=0}^{m} |S'_i| \ge \sum_{i=0}^{m} |S_{\sigma(i)}|$. This is because $\bigcup_{i=0}^{m} S'_i = [a_k, b_n)$ and $\bigcup_{i=0}^{m} S_{\sigma(i)} \subseteq [a_k, b_n)$, and the segments are disjoint.

We now show that (5) is also true for $m < p$, by induction on $m$. For the base case, note that the observation above immediately implies that $|S'_0| \ge |S_{\sigma(0)}|$. For the induction step, we assume that $\sum_{i=0}^{m} |S'_i| \ge \sum_{i=0}^{m} |S_{\sigma(i)}|$. The question now is whether

$$\sum_{i=0}^{m+1} |S'_i| \ge \sum_{i=0}^{m+1} |S_{\sigma(i)}| \qquad (6)$$

is true. Let us consider $S_{\sigma(m+1)}$. If each $S_{\sigma(r)}$ with $r \le m$ is contained in the left-hand side of (6), then, using the induction hypothesis $\sum_{i=0}^{m} |S'_i| \ge \sum_{i=0}^{m} |S_{\sigma(i)}|$, the validity of (6) follows. Indeed, if $S_{\sigma(m+1)}$ is also contained in the left-hand side of (6), the inequality is certainly valid; else we know that $|S'_{m+1}| \ge |S_{\sigma(m+1)}|$. If there exists an $S_{\sigma(r)}$ with $r \le m$ not contained in the left-hand side of (6), then $|S'_{m+1}| \ge |S_{\sigma(r)}| \ge |S_{\sigma(m+1)}|$ (where the first inequality holds because the length of a segment $S_{\sigma(r)}$ not contained in the left-hand side of (6) is a lower bound for $|S'_{m+1}|$). This completes the proof of (5).

We now invoke Lemma 2 by setting $q := n - k$, and for $i = 0, 1, \ldots, n - k$ we set $x_i := |S'_i|$ and $y_i := |S_{\sigma(i)}|$. Clearly, the arguments given above imply that the conditions of Lemma 2 are satisfied. Hence, when $f$ is continuous and convex, the cost of $\alpha$ is greater than or equal to the cost of $\alpha_0$.

Case 2: There is another E-interval $I_j$, where $I_j$ is the first E-interval after $I_k$.

For this case, we use the following observation. Let $I_p$ and $I_q$ be two consecutive intervals in a solution, and suppose that they are disjoint. Then it does not matter for the cost of the solution whether $I_p$ or $I_q$ is processed first of the two. Now since $I_j$ is an E-interval, it must be processed before all intervals $I_i$ that contain $a_j$ (otherwise $I_j$ is not an E-interval), and it can be processed before all intervals with $b_i \le a_j$. Thus, we conclude that $I_j$ is processed before the intervals indexed by $k, k+1, \ldots, j-1$. Since the intervals are agreeable, the exposed parts (after processing $I_j$) of the intervals before $I_j$ are disjoint from the intervals with index greater than $j$. Therefore we can assume that the intervals with index $k, \ldots, j-1$ are processed after the intervals with index $j, \ldots, n$. And of course, the latter intervals are processed optimally by a sequence of the solution to $(X_j, f)$. Let $I_\ell$ with $\ell < j$ be the latest interval that does not intersect interval $I_j$. Notice that by the choice of $j$, the optimal sequence of the intervals $I_k, \ldots, I_\ell$ contains only one E-interval, namely $I_k$. Hence, that optimal sequence has a cost of $f(b_k - a_k) + \sum_{i=k+1}^{\ell} f(b_i - b_{i-1})$. Finally, we need to take into account the intervals $I_{\ell+1}, \ldots, I_{j-1}$. Thus, we incur $f(a_j - b_\ell)$ for the exposed part between $b_\ell$ and $a_j$, corresponding to interval $I_{\ell+1}$, and we incur a cost of $f(0)$ for each of the remaining $j - \ell - 2$ intervals. Notice that all intervals $I_{\ell+2}, \ldots, I_{j-1}$ are completely covered in this sequence. This completes the proof of this lemma. □



Theorem 2. The interval ordering problem $(X, f)$ with $X$ agreeable and $f$ convex and continuous can be solved in $O(n^2)$ time.

Proof: The $O(n^2)$ time complexity of the dynamic program following from Lemma 3 (see equations (2), (3) and (4)) is explained by the fact that there are $n$ variables and each is a minimization over $O(n)$ values. □
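The recurrences (2), (3) and (4) can be turned into the following Python sketch (the function name is ours). It assumes the normalization of Section 3.1, namely strictly increasing left and right endpoints and overlapping consecutive intervals, evaluates $C_k$ for $k = n, n-1, \ldots, 1$, and precomputes prefix sums of $f(b_i - b_{i-1})$ together with the index $\ell$ for every $j$, so that each of the $O(n^2)$ candidates costs $O(1)$. It simply transcribes Lemma 3 and inherits its assumptions.

```python
from bisect import bisect_right


def convex_agreeable_optimum(intervals, f):
    """Optimal cost C_1 via the recurrences of Lemma 3 (f continuous and convex)."""
    if not intervals:
        return 0
    iv = sorted(intervals)
    n = len(iv)
    a = [s for s, _ in iv]
    b = [e for _, e in iv]
    for i in range(n - 1):
        assert a[i] < a[i + 1] and b[i] < b[i + 1], "endpoints must be strictly increasing"
        assert a[i + 1] < b[i], "consecutive intervals must overlap (split the instance otherwise)"
    # P[i] = sum of f(b[t] - b[t-1]) for t = 1..i (0-based), so range sums cost O(1).
    P = [0] * n
    for i in range(1, n):
        P[i] = P[i - 1] + f(b[i] - b[i - 1])
    # last[j] = latest index l with b[l] <= a[j], or -1 if there is none.
    last = [bisect_right(b, a[j]) - 1 for j in range(n)]
    C = [0] * n
    C[n - 1] = f(b[n - 1] - a[n - 1])
    for k in range(n - 2, -1, -1):
        # (2): I_k is the only E-interval; the sequence is (k, k+1, ..., n-1).
        best = f(b[k] - a[k]) + P[n - 1] - P[k]
        # (3): the first further E-interval is I_{k+1}, processed before I_k.
        best = min(best, f(a[k + 1] - a[k]) + C[k + 1])
        # (4): the first further E-interval I_j starts to the right of b_k.
        for j in range(k + 2, n):
            if a[j] > b[k]:
                l = last[j]
                cand = (f(b[k] - a[k]) + P[l] - P[k] + f(a[j] - b[l])
                        + (j - l - 2) * f(0) + C[j])
                best = min(best, cand)
        C[k] = best
    return C[0]
```

On small agreeable instances the result can be compared with the $O(n^3)$ program of Lemma 1, which does not need convexity.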

3.2 Laminar intervals 

Let $(X, f)$ be an instance of the interval ordering problem where $X$ contains $n$ intervals $I_i = [a_i, b_i)$ for $i = 1, 2, \ldots, n$. We say that the set $X$ of intervals is laminar if for any two intervals $I_i$ and $I_j$ in $X$, either $I_i \cap I_j = \emptyset$ or one is included in the other. See Figure 4 for an illustration.



Figure 4: Illustration of laminar intervals 



An ordering $\alpha$ respects the inclusions if for any two intervals $I_i$ and $I_j$ with $I_i \subset I_j$ we have that $i$ appears before $j$ in $\alpha$.

Lemma 4. Let $(X, f)$ be an instance of the interval ordering problem with $X$ laminar. If the function $g$ defined by $g(x) = f(x) - f(0)$ is super-additive, i.e., $g(x + y) \ge g(x) + g(y)$, then any ordering that respects the inclusions is an optimal solution to $(X, f)$.

Proof: Let $\alpha$ be an arbitrary order of optimal cost. We will show that there is another order respecting the inclusions and having a cost not greater than that of $\alpha$.

Suppose that $\alpha$ does not respect the inclusions. Then there is a pair $i, j$ with $I_i \subset I_j$ such that $j$ appears before $i$ in $\alpha$. Let $\alpha'$ be the result of placing $j$ right after $i$ in the order $\alpha$. Let $x$ be the length of the exposed part of $I_j$ in $\alpha'$, and $y$ be the length of the exposed part of $I_i$ in $\alpha'$. Then $x + y$ is the length of the exposed part of $I_j$ in $\alpha$. Therefore the contribution of $I_i$ and $I_j$ to the cost of $\alpha$ is $f(x + y) + f(0)$, while their contribution to the cost of $\alpha'$ is $f(x) + f(y)$.

Since $g$ is super-additive, it follows that $f(x + y) + f(0) \ge f(x) + f(y)$. We conclude that the cost of $\alpha'$ is not more than the cost of $\alpha$. By repeating the argument, we eventually obtain an inclusion-respecting order with optimal cost. Finally, every inclusion-respecting order assigns each interval the same exposed part, namely the interval minus the union of the intervals it contains, so all such orders have the same cost. □

An inclusion respecting order can be found simply by sorting the intervals in increasing 
order of their lengths, breaking ties arbitrarily. 
Theorem 3. The interval ordering problem $(X, f)$ with $X$ laminar and $f$ such that the function $g(x) = f(x) - f(0)$ is super-additive can be solved in $O(n \log n)$ time.

Proof: Immediate. □

We show in Section 4 that the time complexity of any exact algorithm for solving this problem cannot be better than the time complexity of algorithms for sorting. Thus, when restricting ourselves to comparison-based algorithms, the bound in Theorem 3 is the best possible (see [7]).

Remark 4. Notice that the problem $(X, f)$ with $X$ laminar and $f$ such that the function $g(x) = f(x) - f(0)$ is sub-additive can also be solved in $O(n \log n)$ time by sorting the intervals in decreasing order of their lengths.
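Theorem 3 and Remark 4 amount to a single sort. The Python sketch below (names are ours) checks laminarity of half-open intervals naively in $O(n^2)$, which is only for illustration, and returns the indices sorted by increasing length when $g$ is super-additive and by decreasing length when $g$ is sub-additive; `ordering_cost` evaluates the objective of Definition 1 for a given $f$.

```python
def is_laminar(intervals):
    """Every pair of half-open intervals is disjoint or nested."""
    for i, (a, b) in enumerate(intervals):
        for c, d in intervals[i + 1:]:
            disjoint = b <= c or d <= a
            nested = (c <= a and b <= d) or (a <= c and d <= b)
            if not (disjoint or nested):
                return False
    return True


def laminar_order(intervals, superadditive=True):
    """Increasing length if g is super-additive (Theorem 3), decreasing if sub-additive (Remark 4)."""
    assert is_laminar(intervals), "instance is not laminar"
    # sorted() is stable, also with reverse=True, so ties keep their input order.
    return sorted(range(len(intervals)),
                  key=lambda i: intervals[i][1] - intervals[i][0],
                  reverse=not superadditive)


def ordering_cost(intervals, order, f):
    """Objective of Definition 1 for the given ordering."""
    covered, total = [], 0
    for idx in order:
        a, b = intervals[idx]
        pieces = sorted((max(a, c), min(b, d)) for c, d in covered if max(a, c) < min(b, d))
        hidden, reach = 0, a
        for s, e in pieces:  # covered length inside [a, b), merging overlaps
            s = max(s, reach)
            if e > s:
                hidden, reach = hidden + e - s, e
        total += f((b - a) - hidden)
        covered.append((a, b))
    return total
```

For example, $f(x) = 2^x$ gives the super-additive $g(x) = 2^x - 1$, while $f(x) = \sqrt{x}$ gives a sub-additive $g$.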

3.3 Bottleneck variant of the interval ordering problem 

In this subsection, we consider the bottleneck variant of the interval ordering problem. Referring to the application described in Section 2, instead of looking for the exact complexity $O(\sum_{i=1}^{n} 2^{\ell_i})$ of the BruteForce search algorithm, we focus on the maximum power of two that dominates this complexity. Hence, solving the bottleneck variant gives us a solution that is an approximation of the optimal solution to the interval ordering problem (for a non-negative cost function, the sum of the $n$ terms is at most $n$ times their maximum, so an optimal bottleneck ordering is within a factor $n$ of the optimal total cost). The bottleneck variant is explicitly defined as follows.

Definition 3. The Bottleneck Interval Ordering Problem (BIO): Given a function $f$ and a set $X = \{I_1, \ldots, I_n\}$ of intervals over the real line, find an ordering $\alpha \in S_n$ that minimizes the value

$$\max_{1 \le k \le n} f\Big(\Big|I_{\alpha(k)} \setminus \bigcup_{j=1}^{k-1} I_{\alpha(j)}\Big|\Big).$$

A greedy algorithm for this variant would iteratively select the interval with the smallest exposed part. A formal description is given in Algorithm 2.

Algorithm 2 Smallest Exposed Part Algorithm
1: for every $i = 1, \ldots, n$, let $I'_i := I_i$ be the exposed part of the $i$-th interval
2: Let $S = \{1, \ldots, n\}$ be the set of yet unselected intervals
3: for $j = 1, \ldots, n$ do
4:   Identify $i \in S$ such that $|I'_i|$ is minimal
5:   $\alpha(j) := i$
6:   $S := S \setminus \{i\}$
7:   for $k \in S$ do
8:     update $I'_k := I'_k \setminus I_i$
9:   end for
10: end for
11: Return $\alpha$

In the rest of this section we prove that Algorithm 2 solves instances of BIO when the cost function $f$ is non-decreasing. Our proof is based on the following lemmas.
Lemma 5. Let $(X, f)$ be an instance of BIO with a non-decreasing cost function $f$. There exists an optimal solution to $(X, f)$ starting with a smallest interval.

Proof: We prove this result by contradiction. Let $(X, f)$ be an instance of BIO with a non-decreasing cost function $f$. Assume that no optimal sequence to $(X, f)$ starts with a smallest interval. Consider an optimal sequence $\alpha = (\alpha(1), \ldots, \alpha(i_0), \ldots, \alpha(n))$ to $(X, f)$ with the corresponding optimal value $val(\alpha)$. Clearly, $val(\alpha) \ge f(|I_{\alpha(1)}|)$, since the first interval is entirely exposed. Let $I_{\alpha(i_0)}$ be the first smallest interval in $\alpha$, i.e., $|I_{\alpha(i_0)}| \le |I_{\alpha(j)}|$ for all $j \in \{1, \ldots, n\}$ and $|I_{\alpha(i_0)}| < |I_{\alpha(j)}|$ for all $j \in \{1, \ldots, i_0 - 1\}$. Consider now the sequence $\alpha' = (\alpha(i_0), \alpha(1), \ldots, \alpha(i_0 - 1), \alpha(i_0 + 1), \ldots, \alpha(n))$, where $\alpha(i_0)$ is moved to the first position in $\alpha$. It is clear that this move only affects $I_{\alpha(i_0)}$ and the intervals that were sequenced before $I_{\alpha(i_0)}$ in $\alpha$. Further, since $f$ is non-decreasing, the exposed part of every other affected interval cannot have become longer, and the new cost $f(|I_{\alpha(i_0)}|)$ of the moved interval is at most $f(|I_{\alpha(1)}|) \le val(\alpha)$ because $|I_{\alpha(i_0)}| < |I_{\alpha(1)}|$. We conclude that the objective value achieved by $\alpha'$ does not exceed $val(\alpha)$. Therefore, $\alpha'$ is also an optimal sequence to $(X, f)$, which is a contradiction. □

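Given an arbitrary instance $(X, f)$ of BIO with $n$ intervals and $I_{i_0} \in X$ a smallest interval, we define the $I_{i_0}$-reduced instance $(X_{i_0}, f)$ with $n - 1$ intervals as follows. For any interval $I_j \in X$:

1. if $I_j \ne I_{i_0}$ and $I_j \cap I_{i_0} = \emptyset$, then $I_j \in X_{i_0}$;

2. if $I_j \ne I_{i_0}$ and $I_j \cap I_{i_0} \ne \emptyset$, then $I_j \setminus I_{i_0} \in X_{i_0}$.

Furthermore, the real line is adapted accordingly (the part covered by $I_{i_0}$ is contracted to a single point) such that $I_j \setminus I_{i_0}$ is an interval for all $j \ne i_0$.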

Lemma 6. Let $(X, f)$ be an instance of BIO with a non-decreasing cost function $f$. Let $I_{i_0} \in X$ be a smallest interval and $\alpha_{i_0}$ be an optimal sequence to $(X_{i_0}, f)$. Then $(i_0, \alpha_{i_0})$ is an optimal sequence to $(X, f)$.

Proof: Let $val(\alpha_{i_0})$ be the objective value of the optimal solution to $(X_{i_0}, f)$.

1. If $f(|I_{i_0}|) \ge val(\alpha_{i_0})$, then $(i_0, \alpha_{i_0})$ is clearly an optimal sequence to $(X, f)$ (recall that $I_{i_0}$ is a smallest interval).

2. On the other hand, suppose that $f(|I_{i_0}|) < val(\alpha_{i_0})$. Lemma 5 implies that there exists an optimal sequence $\alpha'$ to $(X, f)$ starting with $I_{i_0}$. After selecting $I_{i_0}$, the instance that remains is $(X_{i_0}, f)$, for which $\alpha_{i_0}$ is optimal. Therefore, $(i_0, \alpha_{i_0})$ is an optimal solution to $(X, f)$. □



Theorem 5. The Bottleneck Interval Ordering problem $(X, f)$ with $X$ arbitrary and $f$ non-decreasing can be solved in $O(n^2)$ time.

Proof: This result follows from Lemma 5 and Lemma 6. □
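Algorithm 2 can be sketched in Python as follows (names are ours). Recomputing every exposed part from scratch in each round keeps the code short but does not attain the $O(n^2)$ bound of Theorem 5; the exhaustive `brute_force_bottleneck` is only meant for cross-checking on tiny instances with a non-decreasing $f$.

```python
from itertools import permutations


def exposed_length(iv, covered):
    """Length of the half-open interval iv not covered by any interval in `covered`."""
    a, b = iv
    pieces = sorted((max(a, c), min(b, d)) for c, d in covered if max(a, c) < min(b, d))
    hidden, reach = 0, a
    for s, e in pieces:
        s = max(s, reach)
        if e > s:
            hidden, reach = hidden + e - s, e
    return (b - a) - hidden


def smallest_exposed_order(intervals):
    """Algorithm 2: repeatedly select the interval with the smallest exposed part."""
    remaining, covered, order = list(range(len(intervals))), [], []
    while remaining:
        # min() returns the first minimizer, so ties go to the lowest index.
        i = min(remaining, key=lambda j: exposed_length(intervals[j], covered))
        order.append(i)
        remaining.remove(i)
        covered.append(intervals[i])  # plays the role of the update in line 8
    return order


def bottleneck_value(intervals, order, f):
    """Objective of Definition 3: the largest cost of an exposed part."""
    covered, worst = [], None
    for idx in order:
        v = f(exposed_length(intervals[idx], covered))
        worst = v if worst is None else max(worst, v)
        covered.append(intervals[idx])
    return worst


def brute_force_bottleneck(intervals, f):
    """Exhaustive optimum of Definition 3, for small instances only."""
    return min(bottleneck_value(intervals, p, f) for p in permutations(range(len(intervals))))
```

On the instance of Example 1 with $f(x) = 2^x$, the greedy ordering happens to coincide with the optimal ordering of the sum objective, and its bottleneck value 4 matches the exhaustive optimum.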

Remark 6. Notice that if the function $f$ is non-increasing, then the instances of BIO with this cost function can be solved with an $O(n^2)$-time algorithm similar to Algorithm 2, where in line 4, instead of taking the interval with the smallest exposed part, we take the interval with the longest exposed part.






4 Complexity results 



This section presents a number of negative results on the computational complexity of the interval ordering problem. Our first result shows that even the easy special cases discussed in Section 3.1 are not completely straightforward, and shows the optimality of the algorithm given in Section 3.2.



Theorem 7. The interval ordering problem is at least as hard as the SORTING problem, even if (a) the intervals are agreeable, or (b) the intervals form a laminar set. Consequently, every comparison-based algorithm for these special cases has time complexity $\Omega(n \log n)$.

Proof: Let $x_1, \ldots, x_n$ be an arbitrary sequence of positive real numbers that form an instance of the SORTING problem. We construct a corresponding instance of the interval ordering problem that consists of the intervals $I_j = [0, x_j)$ for $j = 1, \ldots, n$, together with the cost function $f(x) = 2^x$. Note that this set of intervals is both agreeable and laminar.

The cost function $f$ is such that $g(x) = f(x) - f(0)$ is super-additive on the positive real numbers; in fact, $g(x + y) - g(x) - g(y) = (2^x - 1)(2^y - 1) > 0$ for $x, y > 0$. This observation easily yields that the optimal ordering of the intervals must sequence them by increasing right endpoint, and hence induces a solution to the SORTING problem. □
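The reduction in this proof can be illustrated with a deliberately naive Python sketch (names are ours). Since all intervals start at 0, the exposed part of $[0, x)$ is $[r, x)$ for the current reach $r$, so the cost of an ordering is easy to evaluate; the exhaustive search over orderings is exponential and only serves to show that a cheapest ordering lists the $x_j$ in increasing order.

```python
from itertools import permutations


def reduction_cost(xs, order):
    """Cost of processing the intervals [0, x) in the given order with f(x) = 2**x."""
    reach, total = 0, 0
    for i in order:
        # All intervals start at 0, so the exposed part is [reach, x) when x > reach.
        total += 2 ** max(0, xs[i] - reach)
        reach = max(reach, xs[i])
    return total


def sort_by_interval_ordering(xs):
    """Return the x_j in the order of a cheapest interval ordering (exhaustive search)."""
    best = min(permutations(range(len(xs))), key=lambda order: reduction_cost(xs, order))
    return [xs[i] for i in best]
```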

Next, we discuss the computational complexity of the interval ordering problem in general, and show that there is little hope of finding a polynomial-time algorithm for it. The reduction is from the following variant of the NP-hard PARTITION problem [8, problem SP12].



Instance: A finite set $\{q_1, q_2, \dots, q_n\}$ of $n$ positive integers with sum $2Q$; an integer $k$.
Question: Does there exist an index set $J \subseteq \{1, \dots, n\}$ with $|J| = k$ such that $\sum_{j \in J} q_j = Q$?



Lemma 7. Let $I$ be an instance of PARTITION and let $N$ be an integer such that $2^{N-1} > 2^n Q + k$. Then there exists an instance $(X, f)$ of the interval ordering problem that can be built from $I$ in polynomial time and that satisfies the following conditions:

(i) If $I$ is a YES-instance of PARTITION, then the total cost of an optimal sequence to $(X, f)$ is at most $2^n Q + n - k$.

(ii) If $I$ is a NO-instance of PARTITION, then the total cost of an optimal sequence to $(X, f)$ is at least $2^N + 2^n Q + n - k$.

Proof: Consider an arbitrary instance $I$ of PARTITION. We build the instance $(X, f)$ of the interval ordering problem as follows. The cost function $f : \mathbb{N} \to \mathbb{N}$ is defined by $f(x) = 0$ if $x$ is a power of two, and by $f(x) = x$ otherwise. The set $X$ consists of $n + 2$ intervals. First, for $i = 1, \dots, n$ there is a so-called element-interval of length $\ell_i = 2^n q_i + 1$. These element-intervals are pairwise disjoint and put back to back, so that they jointly cover the interval from $0$ to $L := \sum_{j=1}^{n} \ell_j = 2^{n+1} Q + n$. Secondly, there is a so-called dummy-interval of length $\ell_{n+1} = 2^N - 2^n Q - k$ that goes from $L$ to $L + \ell_{n+1}$. Thirdly, there is the so-called main-interval that covers all other intervals and goes from $0$ to $L + \ell_{n+1}$; hence the length of the main-interval is $2^N + 2^n Q + n - k$. Clearly, this construction of $(X, f)$ can be done in polynomial time. Next, we prove (i) and (ii).

(i) Assume that $I$ is a YES-instance of PARTITION, and let $J \subseteq \{1, \dots, n\}$ be the corresponding index set. First select the element-intervals that correspond to the $q_i$ with $i \notin J$, then the main-interval, followed by the remaining element-intervals, and finally the dummy-interval. For the first batch of element-intervals we pay a cost of $2^n Q + n - k$ (their lengths are odd and larger than 1, hence not powers of two). The exposed part of the main-interval then has length $2^N$, which yields a cost of 0. This reduces the exposed part of all remaining intervals down to length 0. The overall total cost is then $2^n Q + n - k$.

(ii) Now assume that $I$ is a NO-instance of PARTITION. We claim that in this case no sequencing can ever turn the length of the exposed part of the main-interval into a power of 2. Every element-interval and the dummy-interval is either completely exposed or completely covered when it is sequenced, and none of their lengths is a power of 2 (the element-interval lengths are odd and larger than 1, and the dummy-interval length lies strictly between $2^{N-1}$ and $2^N$). Hence, given the claim, every interval pays exactly the length of its exposed part, and the total cost equals the total length $2^N + 2^n Q + n - k$ of the main-interval. It remains to prove the claim. We distinguish two cases. The first case deals with the main-interval being sequenced before the dummy-interval. At that point in time, the length of the exposed part of the main-interval equals the length of the dummy-interval plus the length of the currently unsequenced element-intervals. The length of the dummy-interval is $2^N - 2^n Q - k > 2^{N-1}$. The length of the dummy-interval plus the length of all element-intervals is $2^N + 2^n Q + n - k < 2^{N+1}$. Hence the only candidate power of 2 would be $2^N$. But in this case the already sequenced element-intervals would have a total length of $2^n Q + n - k$; since $\ell_i = 2^n q_i + 1$ and $n < 2^n$, these would be exactly $n - k$ intervals whose $q_i$ sum to $Q$, and the remaining $k$ indices would form a solution to the PARTITION instance $I$; a contradiction. The second case deals with the main-interval being sequenced after the dummy-interval. At that point in time, the length of the exposed part of the main-interval equals the length of the remaining unsequenced element-intervals. But the total length of such a subset can never be a power of 2: it is either 0, or larger than $2^n$ without being divisible by $2^n$. □
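The construction can be checked on tiny instances with the following Python sketch, which builds $(X, f)$ as described in the proof and computes an optimal total cost by brute force over all $(n+2)!$ orderings. The helper names, the half-open `(a, b)` representation, and the index layout (element-intervals first, then the dummy-interval, then the main-interval) are our own assumptions for illustration. For $q = (1, 1, 2, 2)$ and $k = 2$ (a YES-instance) we have $n = 4$ and $Q = 3$, and $N = 7$ satisfies $2^{N-1} = 64 > 2^n Q + k = 50$; for $q = (1, 1, 1, 3)$ and $k = 2$ no two numbers sum to $Q = 3$, so it is a NO-instance.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open interval [a, b)


def f(x: int) -> int:
    """Cost function of the reduction: 0 on powers of two, identity otherwise."""
    return 0 if x > 0 and x & (x - 1) == 0 else x


def build_instance(q: Sequence[int], k: int, N: int) -> List[Interval]:
    """Hypothetical helper: element-intervals, then dummy-interval, then main-interval."""
    n, Q = len(q), sum(q) // 2
    assert 2 ** (N - 1) > 2 ** n * Q + k, "condition of Lemma 7"
    intervals: List[Interval] = []
    start = 0
    for qi in q:  # element-intervals of length 2^n q_i + 1, put back to back
        length = 2 ** n * qi + 1
        intervals.append((start, start + length))
        start += length
    end = start + 2 ** N - 2 ** n * Q - k
    intervals.append((start, end))  # dummy-interval [L, L + l_{n+1})
    intervals.append((0, end))      # main-interval covering all other intervals
    return intervals


def total_cost(intervals: Sequence[Interval], order: Sequence[int]) -> int:
    """Sum of f(length of the exposed part), relative to earlier intervals."""
    points = sorted({p for a, b in intervals for p in (a, b)})
    segments = list(zip(points, points[1:]))
    covered = [False] * len(segments)
    cost = 0
    for j in order:
        a, b = intervals[j]
        exposed = 0
        for s, (lo, hi) in enumerate(segments):
            if not covered[s] and a <= lo and hi <= b:
                covered[s] = True
                exposed += hi - lo
        cost += f(exposed)
    return cost


def optimal_cost(intervals: Sequence[Interval]) -> int:
    """Brute force over all orderings; only feasible for tiny instances."""
    return min(total_cost(intervals, o) for o in permutations(range(len(intervals))))
```

On the YES-instance the optimum equals the bound $2^n Q + n - k = 50$ from (i), attained for instance by the ordering of part (i) with $J = \{1, 3\}$ (indices `[1, 3, 5, 0, 2, 4]` in the 0-based layout above); on the NO-instance every ordering costs exactly $2^N + 2^n Q + n - k = 178$.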

Of course, Lemma 7 immediately yields the NP-hardness of the interval ordering problem. We also derive from it the inapproximability of this problem. Suppose, for the sake of contradiction, that there is a polynomial-time approximation algorithm with some finite worst-case guarantee $\theta$. Pick an arbitrary instance $I$ of PARTITION, and choose an integer $N$ that satisfies the condition of Lemma 7 and is sufficiently large so that

$$2^N > (\theta - 1)(2^n Q + n - k). \qquad (7)$$

Then $N$ is roughly $n + \log Q + \log \theta$, and hence the encoding length of $2^N$ is polynomially bounded in the size of instance $I$. We construct the instance $(X, f)$ of the interval ordering problem as indicated in the proof of Lemma 7 and feed it into the approximation algorithm. If $I$ is a YES-instance, the algorithm returns a sequence of cost at most $\theta(2^n Q + n - k)$; if $I$ is a NO-instance, every sequence costs at least $2^N + 2^n Q + n - k > \theta(2^n Q + n - k)$ by (7). Hence the answer allows us to decide in polynomial time whether instance $I$ is a YES-instance or a NO-instance of PARTITION.
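The arithmetic of this gap argument can be spelled out in a few lines of Python. The sketch below (hypothetical helper names, not from the paper) picks the smallest $N$ that satisfies both the condition of Lemma 7 and inequality (7), and decides PARTITION from the cost returned by a $\theta$-approximation by comparing it with the threshold $\theta(2^n Q + n - k)$.

```python
def choose_N(n: int, Q: int, k: int, theta: float) -> int:
    """Smallest N meeting the condition of Lemma 7 and inequality (7)."""
    N = 1
    while not (
        2 ** (N - 1) > 2 ** n * Q + k                     # condition of Lemma 7
        and 2 ** N > (theta - 1) * (2 ** n * Q + n - k)   # inequality (7)
    ):
        N += 1
    return N


def is_yes_instance(approx_cost: float, n: int, Q: int, k: int, theta: float) -> bool:
    """YES iff the approximate cost does not exceed theta times the YES-bound of Lemma 7(i)."""
    return approx_cost <= theta * (2 ** n * Q + n - k)
```

For $n = 4$, $Q = 3$, $k = 2$ and $\theta = 4$, this gives $N = 8$: a YES-instance yields a cost of at most $4 \cdot 50 = 200$, while a NO-instance forces a cost of at least $2^8 + 50 = 306$.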

Theorem 8. The interval ordering problem is NP-hard. Furthermore, the interval ordering 
problem does not possess any polynomial-time approximation algorithm with constant worst- 
case guarantee, unless P = NP. 
5 Conclusion



This paper studies the problem of ordering a given set of intervals on the real line to minimize 
the total cost, where the cost incurred for an interval depends on the length of its exposed 
part when it is processed. We were motivated to consider this problem by an application in 
molecular biology. Our work proposes polynomial-time algorithms for some special cases of 
the problem. Furthermore, we prove that the problem is NP-hard and is unlikely to have a 
constant-factor-approximation algorithm. 

Some interesting special cases of our problem remain open. For instance, when the cost function is continuous and convex (and without any assumption on the structure of the intervals), the complexity of the problem is not settled. In particular, the case $f(x) = 2^x$ is interesting (note that our NP-hardness construction does not yield anything for this particular cost function). Finally, it would be interesting to identify other special cases that can be solved in polynomial time.